Importing required libraries¶

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Importing the dataset¶

In [2]:
netflix=pd.read_csv("D:\\Python\\python-projects\\netflix-EDA\\archive (8)\\netflix_titles_2021.csv")
netflix
Out[2]:
show_id type title director cast country date_added release_year rating duration listed_in description
0 s1 Movie Dick Johnson Is Dead Kirsten Johnson NaN United States September 25, 2021 2020 PG-13 90 min Documentaries As her father nears the end of his life, filmm...
1 s2 TV Show Blood & Water NaN Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... South Africa September 24, 2021 2021 TV-MA 2 Seasons International TV Shows, TV Dramas, TV Mysteries After crossing paths at a party, a Cape Town t...
2 s3 TV Show Ganglands Julien Leclercq Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... NaN September 24, 2021 2021 TV-MA 1 Season Crime TV Shows, International TV Shows, TV Act... To protect his family from a powerful drug lor...
3 s4 TV Show Jailbirds New Orleans NaN NaN NaN September 24, 2021 2021 TV-MA 1 Season Docuseries, Reality TV Feuds, flirtations and toilet talk go down amo...
4 s5 TV Show Kota Factory NaN Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... India September 24, 2021 2021 TV-MA 2 Seasons International TV Shows, Romantic TV Shows, TV ... In a city of coaching centers known to train I...
... ... ... ... ... ... ... ... ... ... ... ... ...
8802 s8803 Movie Zodiac David Fincher Mark Ruffalo, Jake Gyllenhaal, Robert Downey J... United States November 20, 2019 2007 R 158 min Cult Movies, Dramas, Thrillers A political cartoonist, a crime reporter and a...
8803 s8804 TV Show Zombie Dumb NaN NaN NaN July 1, 2019 2018 TV-Y7 2 Seasons Kids' TV, Korean TV Shows, TV Comedies While living alone in a spooky town, a young g...
8804 s8805 Movie Zombieland Ruben Fleischer Jesse Eisenberg, Woody Harrelson, Emma Stone, ... United States November 1, 2019 2009 R 88 min Comedies, Horror Movies Looking to survive in a world taken over by zo...
8805 s8806 Movie Zoom Peter Hewitt Tim Allen, Courteney Cox, Chevy Chase, Kate Ma... United States January 11, 2020 2006 PG 88 min Children & Family Movies, Comedies Dragged from civilian life, a former superhero...
8806 s8807 Movie Zubaan Mozez Singh Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan... India March 2, 2019 2015 TV-14 111 min Dramas, International Movies, Music & Musicals A scrappy but poor boy worms his way into a ty...

8807 rows × 12 columns

Data Exploration¶

In [3]:
#show top 5 rows
netflix.head(5)
Out[3]:
show_id type title director cast country date_added release_year rating duration listed_in description
0 s1 Movie Dick Johnson Is Dead Kirsten Johnson NaN United States September 25, 2021 2020 PG-13 90 min Documentaries As her father nears the end of his life, filmm...
1 s2 TV Show Blood & Water NaN Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... South Africa September 24, 2021 2021 TV-MA 2 Seasons International TV Shows, TV Dramas, TV Mysteries After crossing paths at a party, a Cape Town t...
2 s3 TV Show Ganglands Julien Leclercq Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... NaN September 24, 2021 2021 TV-MA 1 Season Crime TV Shows, International TV Shows, TV Act... To protect his family from a powerful drug lor...
3 s4 TV Show Jailbirds New Orleans NaN NaN NaN September 24, 2021 2021 TV-MA 1 Season Docuseries, Reality TV Feuds, flirtations and toilet talk go down amo...
4 s5 TV Show Kota Factory NaN Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... India September 24, 2021 2021 TV-MA 2 Seasons International TV Shows, Romantic TV Shows, TV ... In a city of coaching centers known to train I...
In [4]:
#show bottom 5
netflix.tail(5)
Out[4]:
show_id type title director cast country date_added release_year rating duration listed_in description
8802 s8803 Movie Zodiac David Fincher Mark Ruffalo, Jake Gyllenhaal, Robert Downey J... United States November 20, 2019 2007 R 158 min Cult Movies, Dramas, Thrillers A political cartoonist, a crime reporter and a...
8803 s8804 TV Show Zombie Dumb NaN NaN NaN July 1, 2019 2018 TV-Y7 2 Seasons Kids' TV, Korean TV Shows, TV Comedies While living alone in a spooky town, a young g...
8804 s8805 Movie Zombieland Ruben Fleischer Jesse Eisenberg, Woody Harrelson, Emma Stone, ... United States November 1, 2019 2009 R 88 min Comedies, Horror Movies Looking to survive in a world taken over by zo...
8805 s8806 Movie Zoom Peter Hewitt Tim Allen, Courteney Cox, Chevy Chase, Kate Ma... United States January 11, 2020 2006 PG 88 min Children & Family Movies, Comedies Dragged from civilian life, a former superhero...
8806 s8807 Movie Zubaan Mozez Singh Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan... India March 2, 2019 2015 TV-14 111 min Dramas, International Movies, Music & Musicals A scrappy but poor boy worms his way into a ty...
In [5]:
#show the total number of rows and columns
netflix.shape
Out[5]:
(8807, 12)

The dataset contains 8,807 rows and 12 columns.

In [6]:
#to show each column
netflix.columns
Out[6]:
Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description'],
      dtype='object')
In [7]:
#show the data type of each column
netflix.dtypes
Out[7]:
show_id         object
type            object
title           object
director        object
cast            object
country         object
date_added      object
release_year     int64
rating          object
duration        object
listed_in       object
description     object
dtype: object
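The output above shows date_added stored as object, that is, as plain strings. If date arithmetic is needed later, one option is to parse the column with pd.to_datetime. The sketch below is only an illustration on a small hand-made sample, not on the real file; the column name date_parsed and the sample values are assumptions.

```python
import pandas as pd

# Tiny stand-in for the real column; values follow the "Month day, Year" pattern seen above.
sample = pd.DataFrame({
    "date_added": ["September 25, 2021", " November 1, 2019", None],
})

# Strip stray whitespace, then parse; errors="coerce" turns unparseable values into NaT.
sample["date_parsed"] = pd.to_datetime(
    sample["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
)

print(sample.dtypes)
print(sample["date_parsed"].dt.year.tolist())
```

Keeping the parsed values in a separate column leaves the original strings available for comparison.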
In [8]:
netflix.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB
In [9]:
#statistical information
netflix.describe()
Out[9]:
       release_year
count   8807.000000
mean    2014.180198
std        8.819312
min     1925.000000
25%     2013.000000
50%     2017.000000
75%     2019.000000
max     2021.000000
In [10]:
netflix.describe(include='all')
Out[10]:
show_id type title director cast country date_added release_year rating duration listed_in description
count 8807 8807 8807 6173 7982 7976 8797 8807.000000 8803 8804 8807 8807
unique 8807 2 8807 4528 7692 748 1767 NaN 17 220 514 8775
top s1 Movie Dick Johnson Is Dead Rajiv Chilaka David Attenborough United States January 1, 2020 NaN TV-MA 1 Season Dramas, International Movies Paranormal activity at a lush, abandoned prope...
freq 1 6131 1 19 19 2818 109 NaN 3207 1793 362 4
mean NaN NaN NaN NaN NaN NaN NaN 2014.180198 NaN NaN NaN NaN
std NaN NaN NaN NaN NaN NaN NaN 8.819312 NaN NaN NaN NaN
min NaN NaN NaN NaN NaN NaN NaN 1925.000000 NaN NaN NaN NaN
25% NaN NaN NaN NaN NaN NaN NaN 2013.000000 NaN NaN NaN NaN
50% NaN NaN NaN NaN NaN NaN NaN 2017.000000 NaN NaN NaN NaN
75% NaN NaN NaN NaN NaN NaN NaN 2019.000000 NaN NaN NaN NaN
max NaN NaN NaN NaN NaN NaN NaN 2021.000000 NaN NaN NaN NaN
In [11]:
#count the unique values in each column
netflix.nunique()
Out[11]:
show_id         8807
type               2
title           8807
director        4528
cast            7692
country          748
date_added      1767
release_year      74
rating            17
duration         220
listed_in        514
description     8775
dtype: int64

Data cleaning¶

In [12]:
#check null values
netflix.isnull()
Out[12]:
show_id type title director cast country date_added release_year rating duration listed_in description
0 False False False False True False False False False False False False
1 False False False True False False False False False False False False
2 False False False False False True False False False False False False
3 False False False True True True False False False False False False
4 False False False True False False False False False False False False
... ... ... ... ... ... ... ... ... ... ... ... ...
8802 False False False False False False False False False False False False
8803 False False False True True True False False False False False False
8804 False False False False False False False False False False False False
8805 False False False False False False False False False False False False
8806 False False False False False False False False False False False False

8807 rows × 12 columns

In [13]:
#to show the count of null values
netflix.isnull().sum()
Out[13]:
show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64

In [14]:
miss1=(netflix.isnull().sum()/len(netflix))*100
miss1
Out[14]:
show_id          0.000000
type             0.000000
title            0.000000
director        29.908028
cast             9.367549
country          9.435676
date_added       0.113546
release_year     0.000000
rating           0.045418
duration         0.034064
listed_in        0.000000
description      0.000000
dtype: float64
In [15]:
miss=netflix.isnull().sum()
miss
Out[15]:
show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64
In [16]:
#missing values with percent
m=pd.concat([miss,miss1],axis=1,keys=['total','missing%'])
m
Out[16]:
              total   missing%
show_id           0   0.000000
type              0   0.000000
title             0   0.000000
director       2634  29.908028
cast            825   9.367549
country         831   9.435676
date_added       10   0.113546
release_year      0   0.000000
rating            4   0.045418
duration          3   0.034064
listed_in         0   0.000000
description       0   0.000000
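The count and percentage steps above can be wrapped in one small helper. This is a minimal sketch: the function name missing_summary and the demo frame are made up for illustration, and sorting by the null count is an extra convenience that the cells above do not do.

```python
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column null counts and percentages, largest first."""
    total = df.isnull().sum()
    percent = df.isnull().mean() * 100  # mean of booleans = fraction of nulls
    summary = pd.concat([total, percent], axis=1, keys=["total", "missing%"])
    return summary.sort_values("total", ascending=False)

# Small hypothetical frame standing in for the Netflix data.
demo = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", None, None, "Y"],
    "country": ["India", "Canada", None, "India"],
})
summary = missing_summary(demo)
print(summary)
```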
In [17]:
#using heat map to show null values
sns.heatmap(netflix.isnull())
Out[17]:
<Axes: >

From the output above we can see that the director, cast and country columns contain the most null values. The next cells show how we deal with them.

We drop the rows where director or cast is missing (the columns themselves are kept). This is a heavy cut: it reduces the dataset from 8,807 to 5,700 rows.
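Dropping rows with nulls and dropping whole columns are different operations, and it is easy to confuse them. The sketch below contrasts the two on a small hypothetical frame; the demo data is an assumption, and only the dropna call mirrors what this notebook does.

```python
import pandas as pd

# Small hypothetical frame with nulls in director and cast.
demo = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["X", None, "Y"],
    "cast": ["P", "Q", None],
})

# Option 1: keep every column, drop rows with a null in either column.
rows_dropped = demo.dropna(how="any", subset=["director", "cast"])

# Option 2: keep every row, drop the two columns entirely.
cols_dropped = demo.drop(columns=["director", "cast"])

print(rows_dropped.shape)
print(cols_dropped.shape)
```

Option 1 loses rows but keeps the features; option 2 keeps all rows but loses the features.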

In [18]:
#make a copy of the dataset so the original stays unchanged
netflix_copy=netflix.copy()
netflix_copy.head(5)
Out[18]:
show_id type title director cast country date_added release_year rating duration listed_in description
0 s1 Movie Dick Johnson Is Dead Kirsten Johnson NaN United States September 25, 2021 2020 PG-13 90 min Documentaries As her father nears the end of his life, filmm...
1 s2 TV Show Blood & Water NaN Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... South Africa September 24, 2021 2021 TV-MA 2 Seasons International TV Shows, TV Dramas, TV Mysteries After crossing paths at a party, a Cape Town t...
2 s3 TV Show Ganglands Julien Leclercq Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... NaN September 24, 2021 2021 TV-MA 1 Season Crime TV Shows, International TV Shows, TV Act... To protect his family from a powerful drug lor...
3 s4 TV Show Jailbirds New Orleans NaN NaN NaN September 24, 2021 2021 TV-MA 1 Season Docuseries, Reality TV Feuds, flirtations and toilet talk go down amo...
4 s5 TV Show Kota Factory NaN Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... India September 24, 2021 2021 TV-MA 2 Seasons International TV Shows, Romantic TV Shows, TV ... In a city of coaching centers known to train I...
In [19]:
#drop rows where director or cast is missing
netflix_copy=netflix_copy.dropna(how='any',subset=['director','cast'])
netflix_copy
Out[19]:
show_id type title director cast country date_added release_year rating duration listed_in description
2 s3 TV Show Ganglands Julien Leclercq Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... NaN September 24, 2021 2021 TV-MA 1 Season Crime TV Shows, International TV Shows, TV Act... To protect his family from a powerful drug lor...
5 s6 TV Show Midnight Mass Mike Flanagan Kate Siegel, Zach Gilford, Hamish Linklater, H... NaN September 24, 2021 2021 TV-MA 1 Season TV Dramas, TV Horror, TV Mysteries The arrival of a charismatic young priest brin...
6 s7 Movie My Little Pony: A New Generation Robert Cullen, José Luis Ucha Vanessa Hudgens, Kimiko Glenn, James Marsden, ... NaN September 24, 2021 2021 PG 91 min Children & Family Movies Equestria's divided. But a bright-eyed hero be...
7 s8 Movie Sankofa Haile Gerima Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D... United States, Ghana, Burkina Faso, United Kin... September 24, 2021 1993 TV-MA 125 min Dramas, Independent Movies, International Movies On a photo shoot in Ghana, an American model s...
8 s9 TV Show The Great British Baking Show Andy Devonshire Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho... United Kingdom September 24, 2021 2021 TV-14 9 Seasons British TV Shows, Reality TV A talented batch of amateur bakers face off in...
... ... ... ... ... ... ... ... ... ... ... ... ...
8801 s8802 Movie Zinzana Majid Al Ansari Ali Suliman, Saleh Bakri, Yasa, Ali Al-Jabri, ... United Arab Emirates, Jordan March 9, 2016 2015 TV-MA 96 min Dramas, International Movies, Thrillers Recovering alcoholic Talal wakes up inside a s...
8802 s8803 Movie Zodiac David Fincher Mark Ruffalo, Jake Gyllenhaal, Robert Downey J... United States November 20, 2019 2007 R 158 min Cult Movies, Dramas, Thrillers A political cartoonist, a crime reporter and a...
8804 s8805 Movie Zombieland Ruben Fleischer Jesse Eisenberg, Woody Harrelson, Emma Stone, ... United States November 1, 2019 2009 R 88 min Comedies, Horror Movies Looking to survive in a world taken over by zo...
8805 s8806 Movie Zoom Peter Hewitt Tim Allen, Courteney Cox, Chevy Chase, Kate Ma... United States January 11, 2020 2006 PG 88 min Children & Family Movies, Comedies Dragged from civilian life, a former superhero...
8806 s8807 Movie Zubaan Mozez Singh Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan... India March 2, 2019 2015 TV-14 111 min Dramas, International Movies, Music & Musicals A scrappy but poor boy worms his way into a ty...

5700 rows × 12 columns

In [20]:
#fill missing values of country, rating and duration with 'missing'
netflix_copy=netflix_copy.fillna({'country':'missing','duration':'missing','rating':'missing'})
netflix_copy.head(5)
Out[20]:
show_id type title director cast country date_added release_year rating duration listed_in description
2 s3 TV Show Ganglands Julien Leclercq Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... missing September 24, 2021 2021 TV-MA 1 Season Crime TV Shows, International TV Shows, TV Act... To protect his family from a powerful drug lor...
5 s6 TV Show Midnight Mass Mike Flanagan Kate Siegel, Zach Gilford, Hamish Linklater, H... missing September 24, 2021 2021 TV-MA 1 Season TV Dramas, TV Horror, TV Mysteries The arrival of a charismatic young priest brin...
6 s7 Movie My Little Pony: A New Generation Robert Cullen, José Luis Ucha Vanessa Hudgens, Kimiko Glenn, James Marsden, ... missing September 24, 2021 2021 PG 91 min Children & Family Movies Equestria's divided. But a bright-eyed hero be...
7 s8 Movie Sankofa Haile Gerima Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D... United States, Ghana, Burkina Faso, United Kin... September 24, 2021 1993 TV-MA 125 min Dramas, Independent Movies, International Movies On a photo shoot in Ghana, an American model s...
8 s9 TV Show The Great British Baking Show Andy Devonshire Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho... United Kingdom September 24, 2021 2021 TV-14 9 Seasons British TV Shows, Reality TV A talented batch of amateur bakers face off in...
In [21]:
netflix_copy.isnull().sum()
Out[21]:
show_id         0
type            0
title           0
director        0
cast            0
country         0
date_added      0
release_year    0
rating          0
duration        0
listed_in       0
description     0
dtype: int64

Data pre-profiling¶

In [22]:
%pip install -U ydata-profiling
In [23]:
import ydata_profiling as prf
In [24]:
# Assuming `netflix` is your DataFrame
netflix_profile = prf.ProfileReport(netflix)
netflix_profile
Out[24]:

In [25]:
netflix_profile.to_file(output_file="netflix21_before_preprocessing.html")

Data pre-processing¶

In [26]:
#to show duplicate rows
netflix[netflix.duplicated()]
Out[26]:
show_id type title director cast country date_added release_year rating duration listed_in description
In [27]:
#to show the count of duplicate rows
netflix.duplicated().sum()
Out[27]:
0
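Because show_id is unique for every row, a full-row duplicated() check cannot flag anything in this dataset. A check on a subset of columns can still be informative. The sketch below is an assumption-laden example: the demo frame and the choice of title plus release_year as the key are hypothetical.

```python
import pandas as pd

# Hypothetical frame: unique ids, but two rows share title and release_year.
demo = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "title": ["Alpha", "Beta", "Alpha"],
    "release_year": [2020, 2021, 2020],
})

# Full-row check: the unique show_id hides the repeated title.
full_row_dupes = demo.duplicated().sum()

# Subset check: keep=False marks every member of a duplicate group.
subset_mask = demo.duplicated(subset=["title", "release_year"], keep=False)

print(full_row_dupes)
print(demo[subset_mask])
```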
In [28]:
#check the size of the dataset after cleaning
netflix_copy.shape
Out[28]:
(5700, 12)
In [29]:
#save netflix copy to csv
netflix_copy.to_csv('netflix_clean.csv')
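By default to_csv also writes the row index, which shows up as an extra unnamed column when the file is read back. The sketch below shows the difference using in-memory buffers instead of real files; the demo frame is hypothetical, and whether to keep the index is a choice for the author.

```python
import io
import pandas as pd

# Hypothetical frame with a non-contiguous index, as left behind by dropna.
demo = pd.DataFrame({"title": ["A", "B", "C"]}, index=[2, 5, 6])

# Default: the index is written as a first, unnamed column.
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written.
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)
print(cols_without)
```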

EDA¶

EDA of different questions

In [30]:
netflix.nunique()
Out[30]:
show_id         8807
type               2
title           8807
director        4528
cast            7692
country          748
date_added      1767
release_year      74
rating            17
duration         220
listed_in        514
description     8775
dtype: int64
1. What different types of show or movie are uploaded on Netflix?
In [31]:
netflix_copy.groupby('type')['title'].count()
Out[31]:
type
Movie      5522
TV Show     178
Name: title, dtype: int64

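There are two types of content. The cleaned dataset (netflix_copy) contains 5,522 movies and 178 TV shows. These counts come after dropping rows with a missing director or cast; the full dataset has 6,131 movies and therefore 2,676 TV shows, so the cleaning step removes most TV shows.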
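One way to see how much a cleaning step shifts the mix of types is to put the counts before and after side by side. The sketch below uses a small hypothetical frame, not the Netflix data; the names demo, cleaned and comparison are assumptions.

```python
import pandas as pd

# Hypothetical data in which TV shows are more likely to lack a director.
demo = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "TV Show", "TV Show"],
    "director": ["X", "Y", None, None, "Z"],
    "cast": ["P", "Q", "R", None, "S"],
})
cleaned = demo.dropna(how="any", subset=["director", "cast"])

# Align the two value_counts results on the type label.
comparison = pd.concat(
    [demo["type"].value_counts(), cleaned["type"].value_counts()],
    axis=1, keys=["before", "after"],
)
print(comparison)
```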

2. Which type of content is more common on Netflix, movies or TV shows?

In [32]:
netflix_copy.type.value_counts().to_frame('value_count')
Out[32]:
         value_count
type
Movie           5522
TV Show          178
In [33]:
sns.countplot(x=netflix_copy['type'])
Out[33]:
<Axes: xlabel='type', ylabel='count'>

Movies far outnumber TV shows in the cleaned dataset. The dataset contains no viewing figures, so this reflects the number of titles, not how often they are watched.

In [34]:
#pie chart of the share of each type, computed from the data
type_counts=netflix_copy['type'].value_counts()
plt.pie(type_counts,labels=type_counts.index,autopct="%.2f%%")
plt.show()

3. What are the different ratings defined by Netflix?

In [35]:
netflix_copy['rating'].nunique()
Out[35]:
18
In [36]:
sns.countplot(x=netflix_copy['rating'])
plt.xticks(rotation=90)
Out[36]:
(array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17]),
 [Text(0, 0, 'TV-MA'),
  Text(1, 0, 'PG'),
  Text(2, 0, 'TV-14'),
  Text(3, 0, 'PG-13'),
  Text(4, 0, 'TV-PG'),
  Text(5, 0, 'TV-Y'),
  Text(6, 0, 'R'),
  Text(7, 0, 'TV-G'),
  Text(8, 0, 'TV-Y7'),
  Text(9, 0, 'G'),
  Text(10, 0, 'NC-17'),
  Text(11, 0, '74 min'),
  Text(12, 0, '84 min'),
  Text(13, 0, '66 min'),
  Text(14, 0, 'NR'),
  Text(15, 0, 'TV-Y7-FV'),
  Text(16, 0, 'UR'),
  Text(17, 0, 'missing')])

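TV-MA and TV-14 are the most common ratings in the cleaned dataset, while NC-17 is one of the least common. This reflects how many titles carry each rating, not audience preference.

The rating column has 18 distinct values, but not all of them are real ratings: one is the 'missing' placeholder added during cleaning, and three ('74 min', '84 min', '66 min') are durations that ended up in the wrong column. That leaves 14 genuine ratings.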
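The tick labels above include '74 min', '84 min' and '66 min', which look like durations rather than ratings. One possible repair, applied before the fillna step, is sketched below on a hypothetical frame. That these values belong in duration is an assumption, so the affected rows should be inspected first.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 has a running time in the rating column.
demo = pd.DataFrame({
    "title": ["A", "B", "C"],
    "rating": ["TV-MA", "74 min", "PG"],
    "duration": ["90 min", np.nan, "2 Seasons"],
})

# Flag rating values that look like a running time, e.g. "74 min".
looks_like_duration = demo["rating"].str.fullmatch(r"\d+ min", na=False)

# Move them into duration only where duration is empty, then blank out the rating.
move = looks_like_duration & demo["duration"].isna()
demo.loc[move, "duration"] = demo.loc[move, "rating"]
demo.loc[looks_like_duration, "rating"] = np.nan

print(demo)
```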

4. Show only the titles of all TV shows that were released in India only.

In [37]:
netflix[(netflix['type']== 'TV Show') & (netflix['country']=='India')]['title']
Out[37]:
4                             Kota Factory
39                            Chhota Bheem
50                           Dharmakshetra
66           Raja Rasoi Aur Anya Kahaniyan
69          Stories by Rabindranath Tagore
                       ...                
8173                             Thackeray
8235                           The Calling
8321    The Golden Years with Javed Akhtar
8349                The House That Made Me
8775                       Yeh Meri Family
Name: title, Length: 79, dtype: object
In [38]:
netflix[(netflix['type']== 'TV Show') & (netflix['country']=='India')]['title'].count()
Out[38]:
79

There are 79 TV shows whose only listed country is India.
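The equality test country == 'India' matches only rows where India is the sole listed country, which is what this question asks for. If co-productions should count too, the comma-separated country string can be split and checked for membership. The sketch below uses a hypothetical frame; the names only_india and includes_india are assumptions.

```python
import pandas as pd

# Hypothetical frame with a co-production and a missing country.
demo = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "type": ["TV Show", "TV Show", "TV Show", "Movie"],
    "country": ["India", "United States, India", None, "India"],
})

is_tv = demo["type"] == "TV Show"
only_india = demo["country"] == "India"

# Split on commas and check whether "India" is one of the listed countries.
includes_india = (
    demo["country"].fillna("").str.split(",")
    .apply(lambda names: "India" in [n.strip() for n in names])
)

titles_only = demo.loc[is_tv & only_india, "title"].tolist()
titles_any = demo.loc[is_tv & includes_india, "title"].tolist()
print(titles_only)
print(titles_any)
```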

5. Show the top 10 directors who gave the highest number of TV shows and movies to Netflix.

In [39]:
netflix['director'].value_counts().head(10)
Out[39]:
director
Rajiv Chilaka             19
Raúl Campos, Jan Suter    18
Marcus Raboy              16
Suhas Kadav               16
Jay Karas                 14
Cathy Garcia-Molina       13
Martin Scorsese           12
Youssef Chahine           12
Jay Chapman               12
Steven Spielberg          11
Name: count, dtype: int64
In [40]:
netflix['director'].value_counts().head(10).plot(kind='bar')
Out[40]:
<Axes: xlabel='director'>
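The ranking above treats an entry such as "Raúl Campos, Jan Suter" as a single director. To count individuals instead, the string can be split and exploded before counting. This is a sketch on a hypothetical frame, not a rerun of the ranking above; the names individual and counts are assumptions.

```python
import pandas as pd

# Hypothetical director column with one shared credit and one null.
demo = pd.DataFrame({
    "director": ["Raúl Campos, Jan Suter", "Jan Suter", "Martin Scorsese", None],
})

# One row per individual name: drop nulls, split on commas, explode, trim spaces.
individual = (
    demo["director"].dropna()
    .str.split(",")
    .explode()
    .str.strip()
)
counts = individual.value_counts()
print(counts)
```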

6. How many movies got the "TV-14" rating in Canada?

In [46]:
netflix[(netflix['type'] == 'Movie') & (netflix['rating'] == 'TV-14') & (netflix['country']=='Canada')].shape
Out[46]:
(13, 12)

13 movies whose only listed country is Canada have the "TV-14" rating.

Insights based on EDA:

1) After cleaning, the dataset contains 5,522 movies and 178 TV shows.

2) Movies far outnumber TV shows on Netflix. The dataset has no viewing figures, so this describes the catalogue, not what is watched most.

3) The rating column has 18 distinct values, including the 'missing' placeholder and three misplaced durations. TV-MA and TV-14 are the most common ratings, and NC-17 is one of the least common.

4) 13 movies whose only listed country is Canada have the "TV-14" rating.